
    Analysis of a discrete-layout bimorph disk elements piezoelectric deformable mirror

    We introduce a discrete-layout bimorph disk elements piezoelectric deformable mirror (DBDEPDM) driven by circular flexural-mode piezoelectric actuators. We formulate an electromechanical model for analyzing the performance of the new deformable mirror. As a numerical example, a 21-actuator DBDEPDM with an aperture of 165 mm was modeled. The presented results demonstrate that the DBDEPDM has a stroke larger than 10 μm and a resonance frequency of 4.456 kHz. Compared with conventional piezoelectric deformable mirrors, the DBDEPDM has a larger stroke and a higher resonance frequency, and provides higher spatial resolution due to the circular shape of its actuators. Moreover, numerical simulations of the influence functions of the model are provided.
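    A common first-order way to reason about such a mirror is to superpose per-actuator influence functions. The sketch below uses a Gaussian influence function and a hypothetical 7-actuator layout; the shape, width, and layout are illustrative assumptions, not the paper's 21-actuator design or its electromechanical model:

```python
import math

def influence(x, y, cx, cy, sigma=20.0):
    """Gaussian approximation of one actuator's influence function (unit peak)."""
    r2 = (x - cx) ** 2 + (y - cy) ** 2
    return math.exp(-r2 / (2.0 * sigma ** 2))

def surface(x, y, actuators, strokes):
    """Mirror deflection at (x, y): weighted sum of the influence functions."""
    return sum(s * influence(x, y, cx, cy)
               for (cx, cy), s in zip(actuators, strokes))

# Toy layout (mm): one centre actuator plus six on a 55 mm ring.
ring = [(55.0 * math.cos(a), 55.0 * math.sin(a))
        for a in [k * math.pi / 3 for k in range(6)]]
acts = [(0.0, 0.0)] + ring
strokes = [1.0] * len(acts)  # commanded strokes, in micrometres

centre = surface(0.0, 0.0, acts, strokes)  # slightly above 1.0: ring cross-talk
```

    In practice the influence functions would come from a finite-element or electromechanical model rather than an assumed Gaussian.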

    Egocentric Hand Detection Via Dynamic Region Growing

    Egocentric videos, which mainly record the activities carried out by the users of wearable cameras, have drawn much research attention in recent years. Because of their lengthy content, a large number of ego-related applications have been developed to summarize the captured videos. As users are accustomed to interacting with target objects using their own hands, and their hands usually appear within their visual field during the interaction, an egocentric hand detection step is involved in tasks such as gesture recognition, action recognition, and social interaction understanding. In this work, we propose a dynamic region growing approach for hand region detection in egocentric videos, by jointly considering hand-related motion and egocentric cues. We first determine seed regions that most likely belong to the hand by analyzing the motion patterns across successive frames. The hand regions can then be located by extending from the seed regions, according to the scores computed for the adjacent superpixels. These scores are derived from four egocentric cues: contrast, location, position consistency, and appearance continuity. We discuss how to apply the proposed method in real-life scenarios, where multiple hands irregularly appear in and disappear from the videos. Experimental results on public datasets show that the proposed method achieves superior performance compared with state-of-the-art methods, especially in complicated scenarios.
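    The growing step described above can be sketched as a greedy traversal of a superpixel adjacency graph: start from motion seeds and absorb neighbours whose cue score clears a threshold. The graph, scores, and threshold below are made-up stand-ins for the four egocentric cues, not the paper's scoring model:

```python
from collections import deque

def grow_hand_region(seeds, adjacency, scores, threshold=0.5):
    """Greedy region growing: start from seed superpixels and absorb any
    neighbouring superpixel whose combined-cue score exceeds `threshold`."""
    region = set(seeds)
    frontier = deque(seeds)
    while frontier:
        sp = frontier.popleft()
        for nb in adjacency.get(sp, []):
            if nb not in region and scores.get(nb, 0.0) > threshold:
                region.add(nb)
                frontier.append(nb)
    return region

# Hypothetical 5-superpixel frame: 0 is a motion seed, 3 is background.
adjacency = {0: [1, 3], 1: [0, 2], 2: [1, 4], 3: [0], 4: [2]}
scores = {1: 0.9, 2: 0.7, 3: 0.1, 4: 0.2}  # stand-ins for the four cues
region = grow_hand_region([0], adjacency, scores)  # grows to {0, 1, 2}
```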

    Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics

    We address the problem of video representation learning without human-annotated labels. While previous efforts address the problem by designing novel self-supervised tasks using video data, the learned features are merely frame-based and are not applicable to many video analysis tasks where spatio-temporal features prevail. In this paper we propose a novel self-supervised approach to learn spatio-temporal features for video representation. Inspired by the success of two-stream approaches in video classification, we propose to learn visual features by regressing both motion and appearance statistics along the spatial and temporal dimensions, given only the input video data. Specifically, we extract statistical concepts (the fast-motion region and its dominant direction, spatio-temporal color diversity, dominant color, etc.) from simple patterns in both the spatial and temporal domains. Unlike prior puzzle-style tasks that are hard even for humans to solve, the proposed task is consistent with inherent human visual habits and is therefore easy to answer. We conduct extensive experiments with C3D to validate the effectiveness of our proposed approach. The experiments show that our approach can significantly improve the performance of C3D when applied to video classification tasks. Code is available at https://github.com/laura-wang/video_repres_mas. Comment: CVPR 201
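    As a rough illustration of one such statistical label, the toy sketch below locates the grid cell with the largest frame-difference energy, a simplified stand-in for the "fast-motion region" target; the grid size, frames, and use of raw frame differences (rather than optical flow) are illustrative assumptions:

```python
def largest_motion_cell(frames, grid=2):
    """Return the (row, col) grid cell with the largest summed absolute
    frame difference -- a toy 'fast-motion region' label."""
    h, w = len(frames[0]), len(frames[0][0])
    best, best_cell = -1.0, None
    for gy in range(grid):
        for gx in range(grid):
            total = 0.0
            for t in range(1, len(frames)):
                for y in range(gy * h // grid, (gy + 1) * h // grid):
                    for x in range(gx * w // grid, (gx + 1) * w // grid):
                        total += abs(frames[t][y][x] - frames[t - 1][y][x])
            if total > best:
                best, best_cell = total, (gy, gx)
    return best_cell

# Two 4x4 grey frames: only the top-left quadrant changes between them.
f0 = [[0.0] * 4 for _ in range(4)]
f1 = [[0.0] * 4 for _ in range(4)]
f1[0][0] = f1[0][1] = f1[1][0] = 1.0
cell = largest_motion_cell([f0, f1])  # -> (0, 0)
```

    A network trained on this pretext would predict such a cell index from the raw frames, so no human labels are needed.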

    Modeling Sketching Primitives to Support Freehand Drawing Based on Context Awareness

    Freehand drawing is an easy and intuitive medium for the input and output of thinking. In sketch-based interfaces, support is lacking for natural sketching with drawing cues, such as overlapping, overlooping, and hatching, which occur frequently with physical pen and paper. In this paper, we analyze some characteristics of drawing cues in sketch-based interfaces and describe the different types of sketching primitives. An improved sketch information model is given; the idea is to represent and record design thinking during the freehand drawing process with individuality and diversification. An interaction model based on context is developed, which can guide and support the development of new sketch-based interfaces. New applications with different context contents can easily be derived from it and developed further. Our approach supports the tasks that are common across applications, requiring the designer to provide support only for the application-specific tasks. It is capable of and applicable to modeling various sketching interfaces and applications. Finally, we illustrate the general operation of the system with examples from different applications.

    Modeling and Design of the Communication Sensing and Control Coupled Closed-Loop Industrial System

    With the advent of the 5G era, factories are transitioning to wireless networks to break free from the limitations of wired networks. In 5G-enabled factories, unmanned automatic devices such as automated guided vehicles and robotic arms complete production tasks cooperatively through periodic control loops. In such loops, sensing data are generated by sensors and transmitted to the control center through uplink wireless communications; the corresponding control commands are generated and sent back to the devices through downlink wireless communications. Since wireless communication, sensing, and control are tightly coupled, the modeling and design of such closed-loop systems pose significant challenges. In particular, the existing theoretical tools for these functionalities rest on different models and underlying assumptions, which makes it difficult for them to work with one another. Therefore, in this paper, an analytical closed-loop model is proposed in which the performance and resources of communication, sensing, and control are deeply interrelated. To achieve optimal control performance, a co-design of communication resource allocation and control is proposed, inspired by the model predictive control algorithm. Numerical results are provided to demonstrate the relationships between the resources and control performance. Comment: 6 pages, 3 figures, received by GlobeCom 202
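    To illustrate why communication and control performance are coupled, the minimal sketch below simulates a scalar unstable plant whose controller holds its last command whenever an uplink packet is lost. The plant, deadbeat controller, hold-last-command fallback, and loss pattern are all hypothetical simplifications, not the paper's model or co-design:

```python
def simulate(a, b, losses, x0=1.0):
    """Closed loop x[k+1] = a*x[k] + b*u[k] with a deadbeat command u = -a*x/b.
    When an uplink packet is lost (losses[k] is True), the controller misses
    the new measurement and simply holds the previous command."""
    x, u = x0, 0.0
    cost = 0.0
    for lost in losses:
        if not lost:
            u = -a * x / b        # fresh command from fresh sensing data
        x = a * x + b * u         # plant update
        cost += x * x             # quadratic state cost
    return cost

a, b = 1.2, 1.0  # unstable plant (|a| > 1), so stale commands hurt
cost_good_link = simulate(a, b, losses=[False] * 5)
cost_bad_link = simulate(a, b, losses=[False, True, True, False, True])
```

    More communication resources (fewer losses) directly lower the control cost, which is the coupling the co-design exploits.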

    Shunted Self-Attention via Multi-Scale Token Aggregation

    Recent Vision Transformer (ViT) models have demonstrated encouraging results across various computer vision tasks, thanks to their competence in modeling long-range dependencies of image patches or tokens via self-attention. These models, however, usually assign similar receptive fields to each token feature within each layer. Such a constraint inevitably limits the ability of each self-attention layer to capture multi-scale features, leading to performance degradation in handling images with multiple objects of different scales. To address this issue, we propose a novel and generic strategy, termed shunted self-attention (SSA), that allows ViTs to model attention at hybrid scales per attention layer. The key idea of SSA is to inject heterogeneous receptive field sizes into tokens: before computing the self-attention matrix, it selectively merges tokens to represent larger object features while keeping certain tokens to preserve fine-grained features. This novel merging scheme enables the self-attention to learn relationships between objects of different sizes, and simultaneously reduces the token number and the computational cost. Extensive experiments across various tasks demonstrate the superiority of SSA. Specifically, the SSA-based transformer achieves 84.0% Top-1 accuracy and outperforms the state-of-the-art Focal Transformer on ImageNet with only half the model size and computation cost, and surpasses the Focal Transformer by 1.3 mAP on COCO and 2.9 mIoU on ADE20K under similar parameter and computation cost. Code has been released at https://github.com/OliverRensu/Shunted-Transformer
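    The token-merging idea can be illustrated on scalar tokens: one head attends over the full-resolution key/value set, while another attends over an average-pooled (merged) set that is half the size. This is a toy sketch of the aggregation step under those assumptions, not the released implementation:

```python
import math

def pool_tokens(tokens, rate):
    """Merge every `rate` consecutive tokens into one by averaging: the
    aggregation step that gives a head a coarser, cheaper key/value set."""
    return [sum(tokens[i:i + rate]) / rate
            for i in range(0, len(tokens), rate)]

def attention(q, keys, values):
    """Softmax attention for scalar tokens (a toy stand-in for one ViT head)."""
    logits = [q * k for k in keys]
    m = max(logits)                     # subtract max for numerical stability
    w = [math.exp(l - m) for l in logits]
    z = sum(w)
    return sum(wi / z * v for wi, v in zip(w, values))

tokens = [0.0, 1.0, 2.0, 3.0]
fine = attention(1.0, tokens, tokens)        # head with full-resolution K/V
coarse_kv = pool_tokens(tokens, 2)           # merged K/V: [0.5, 2.5]
coarse = attention(1.0, coarse_kv, coarse_kv)
```

    The coarse head sees half as many key/value tokens, so its attention matrix is half the size: that is the source of the computation savings, while the fine head keeps full detail.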

    Visualizing the Invisible: Occluded Vehicle Segmentation and Recovery

    In this paper, we propose a novel iterative multi-task framework to complete the segmentation mask of an occluded vehicle and recover the appearance of its invisible parts. In particular, to improve the quality of the segmentation completion, we present two coupled discriminators and introduce an auxiliary 3D model pool for sampling authentic silhouettes as adversarial samples. In addition, we propose a two-path structure with a shared network to enhance the appearance recovery capability. By iteratively performing the segmentation completion and the appearance recovery, the results are progressively refined. To evaluate our method, we present a dataset, the Occluded Vehicle dataset, containing synthetic and real-world occluded vehicle images. We conduct comparison experiments on this dataset and demonstrate that our model outperforms the state-of-the-art in the tasks of recovering segmentation masks and appearance for occluded vehicles. Moreover, we also demonstrate that our appearance recovery approach can benefit occluded vehicle tracking in real-world videos.

    Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics

    This paper proposes a novel pretext task to address the self-supervised video representation learning problem. Specifically, given an unlabeled video clip, we compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion, and the spatial location and dominant color of the largest color diversity along the temporal axis. A neural network is then built and trained to yield these statistical summaries given the video frames as inputs. To alleviate the learning difficulty, we employ several spatial partitioning patterns to encode rough spatial locations instead of exact spatial Cartesian coordinates. Our approach is inspired by the observation that the human visual system is sensitive to rapidly changing contents in the visual field, and needs only impressions of rough spatial locations to understand visual content. To validate the effectiveness of the proposed approach, we conduct extensive experiments with four 3D backbone networks, i.e., C3D, 3D-ResNet, R(2+1)D, and S3D-G. The results show that our approach outperforms the existing approaches across these backbone networks on four downstream video analysis tasks, including action recognition, video retrieval, dynamic scene recognition, and action similarity labeling. The source code is publicly available at: https://github.com/laura-wang/video_repres_sts. Comment: Accepted by TPAMI. An extension of our previous work at arXiv:1904.0359

    Socio-demographic association of multiple modifiable lifestyle risk factors and their clustering in a representative urban population of adults: a cross-sectional study in Hangzhou, China

    Background: To plan long-term prevention strategies and develop tailored intervention activities, it is important to understand the socio-demographic characteristics of the subpopulations at high risk of developing chronic diseases. This study aimed to examine the socio-demographic characteristics associated with multiple lifestyle risk factors and their clustering.
    Methods: We conducted a simple random sampling survey to assess lifestyle risk factors in three districts of Hangzhou, China between 2008 and 2009. A two-step cluster analysis was used to identify different health-related lifestyle clusters based on tobacco use, physical activity (PA), fruit and vegetable (FV) consumption, and out-of-home eating. Multinomial logistic regression was used to model the association between socio-demographic factors and lifestyle clusters.
    Results: A total of 2016 eligible people (977 men and 1039 women, aged 18-64 years) completed the survey. Three distinct clusters were identified from the cluster analysis: an unhealthy (UH) group (25.7%), a moderately healthy (MH) group (31.1%), and a healthy (H) group (43.1%). The UH group was characterised by a high prevalence of current daily smoking, a moderate or low level of PA, low FV consumption in frequency or servings, and more occasions of eating out. The H group was characterised by no current daily smoking, a moderate level of PA, high FV consumption, and the fewest occasions of eating out. The MH group was characterised by no current daily smoking, a low or high level of PA, and an intermediate level of FV consumption and frequency of eating out. Men were more likely than women to have unhealthy lifestyles. Adults aged 50-64 years were more likely to live healthy lifestyles, whereas adults aged 40-49 years were more likely to be in the UH group. Adults whose highest level of education was junior high school or below were more likely to be in the UH group. Adults with a high asset index were more likely to be in the MH group.
    Conclusions: This study suggests that urban Chinese people who are middle-aged, male, and less educated are most likely to belong to the cluster with a high-risk profile. These groups will contribute the most to the future burden of major chronic disease and should be targeted by early prevention programs.
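    As a simplified illustration of the clustering step, the sketch below runs plain 1-D k-means (Lloyd's algorithm) on a made-up composite risk score. The study's two-step cluster analysis operates on four lifestyle factors, several of them categorical, so this is only an analogy under those assumptions:

```python
def kmeans_1d(xs, centers, iters=10):
    """Lloyd's k-means on a 1-D score: assign each point to its nearest
    centre, then move each centre to the mean of its assigned points."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in xs:
            j = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            groups[j].append(x)
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    labels = [min(range(len(centers)), key=lambda j: abs(x - centers[j]))
              for x in xs]
    return centers, labels

# Hypothetical composite risk scores (0 = healthy habits, 1 = all risk factors),
# with three initial centres standing in for the H / MH / UH groups.
scores = [0.05, 0.1, 0.12, 0.45, 0.5, 0.55, 0.9, 0.95]
centers, labels = kmeans_1d(scores, centers=[0.0, 0.5, 1.0])
```

    In the study, the resulting cluster labels (not raw scores) become the outcome of the multinomial logistic regression against socio-demographic factors.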